Abstract

BACKGROUND: Competing risks arise when the subject is exposed to more than one cause of
failure. Data consist of the time at which the subject failed and an indicator of which risk
caused the failure.



METHODS: Cardiovascular disease data from the Isfahan Cohort Study were analyzed with three
approaches, Fine and Gray, binomial, and pseudo-value, all of which are based directly on the
cumulative incidence function. The validity of the proportionality assumption is the basis for
selecting an appropriate model. For example, the Fine and Gray model requires the
proportionality assumption to hold. In the binomial approach, a parametric, non-parametric, or
semi-parametric model is chosen according to the validity of the assumption. The pseudo-value
approach, however, does not require proportionality.

RESULTS: After fitting the models to the data, slight differences in parameter and variance
estimates were seen among the models. This showed that the semi-parametric multiplicative model
and the two models based on the pseudo-value approach could be used for fitting this kind of data.

CONCLUSION: We would recommend considering the use of competing risk models instead of 
normal survival methods when subjects are exposed to more than one cause of failure. 
Introduction

Problems involving competing risks are common in
medical research, where (K > 0) competing causes
of failure may occur. The occurrence of any one of the
risks causes failure or death and precludes the
occurrence of the other competing risks. For such data
one observes only the failure time and the cause of
failure for each subject in the study. Methods for
estimating the probability of failure for events that
are subject to competing risks are not new, yet it is
still quite common to see inappropriate methods used
to estimate such probabilities for endpoints that are
subject to competing risks.

Generally, two types of analysis can be
performed when competing risks are present:
modeling the cause-specific hazard, or modeling the
sub-distribution hazard or cumulative incidence
function. Cox regression modeling for each event is
an example of the first type. In such a model, a
subject who has failed from another competing risk
is treated as censored. This method is valid if the
censoring distributions are independent. Multi-state
models, which do not require the existence of
potential failure times, and the Aalen additive
hazards model are other examples of the first type
of modeling. Klein modeled covariate effects using
this method. For the second type, there are the Fine
and Gray method, the binomial approach suggested by
Scheike and Zhang, and the pseudo-value approach
suggested by Klein and Andersen. These
approaches are introduced in section 3. We fitted
these three methods to cardiovascular disease
(CVD) data of the Isfahan Cohort Study (ICS),
introduced in section 2.4. In section 3 we
present the results, and in section 4 the findings
are discussed in brief.

Materials and Methods 

The most common model for competing risks is
formulated in terms of potential failure times. The
K competing risks are denoted by $D_1, \ldots, D_K$,
and for each risk there is a potential failure time
$X_i$, $i = 1, \ldots, K$. One observes
$T = \min(X_1, \ldots, X_K)$ and a variable
$\varepsilon = j$, $j = 1, \ldots, K$, where
$T = X_j$, which identifies the risk that caused
the event. Competing risk probabilities can
be summarized by the cumulative incidence function
for the $j$th competing risk. This function is defined
as the probability of experiencing risk $j$ prior to
time $t$ in the presence of all competing risks. This
quantity depends on all the cause-specific hazard
rates ($h_i(t)$, $i = 1, \ldots, K$), not just the
crude hazard rate of the cause of interest:

(1) $F_j(t) = P[T \le t, \varepsilon = j] = \int_0^t h_j(x) \exp\left\{ -\sum_{i=1}^{K} \int_0^x h_i(u)\,du \right\} dx$
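To make equation (1) concrete, the following minimal sketch evaluates the cumulative incidence function numerically for two constant cause-specific hazards. The hazard values, the function names, and the use of constant hazards are illustrative assumptions, not quantities from the ICS data. The sketch also shows why the naive quantity 1 - exp(-h_1 t), which ignores the competing risk, overstates the probability of the event of interest.

```python
import math

def cumulative_incidence(hazards, j, t, steps=20_000):
    """Midpoint-rule evaluation of F_j(t) for constant cause-specific hazards.

    hazards: list of constant hazard rates h_1, ..., h_K (assumed for illustration)
    j:       zero-based index of the cause of interest
    """
    total = sum(hazards)
    dx = t / steps
    area = 0.0
    for k in range(steps):
        x_mid = (k + 0.5) * dx
        # exp(-sum_i integral_0^x h_i(u) du) reduces to exp(-total * x) here
        overall_survival = math.exp(-total * x_mid)
        area += hazards[j] * overall_survival * dx
    return area

def closed_form(hazards, j, t):
    """Closed form of the same integral when all hazards are constant."""
    total = sum(hazards)
    return hazards[j] / total * (1.0 - math.exp(-total * t))

hazards = [0.010, 0.003]                 # hypothetical monthly hazards
f_event = cumulative_incidence(hazards, 0, 60.0)
f_compete = cumulative_incidence(hazards, 1, 60.0)
naive = 1.0 - math.exp(-hazards[0] * 60.0)   # ignores the competing risk

print(f"F_1(60) = {f_event:.4f}, F_2(60) = {f_compete:.4f}, naive = {naive:.4f}")
```

Under these assumed hazards the two cumulative incidences add up to the overall failure probability 1 - exp(-(h_1 + h_2) t), and the naive estimate is larger than F_1.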

When there are covariates, it is common in the
medical sciences to study their effect on competing
risks quantities. One solution is direct regression
modeling of the cumulative incidence function. Here,
we discuss three approaches that focus on this topic.
Fine and Gray Model

The first approach, suggested by Fine and Gray, is a
proportional sub-distribution hazards model:

(2) $\gamma(t, Z) = \gamma_0(t) \exp[\beta' Z]$

where $\gamma$ and $\gamma_0$ are the hazard and
baseline hazard of the sub-distribution, and $Z$ and
$\beta$ are vectors of covariates and coefficients,
respectively. The partial likelihood is given by:

(3) $L(\beta) = \prod_{i} \frac{\exp[\beta' Z_i]}{\sum_{j \in R_i} w_{ij} \exp[\beta' Z_j]}$

where the product is taken over the subjects who
experienced the event of interest, and

$R_i = \{ j : t_j \ge t_i \text{ or } (t_j \le t_i \text{ and subject } j \text{ had a competing risk event}) \}$

The risk set $R_i$ is formed of those who had not
experienced an event by time $t_i$ and those who
experienced a competing risk event by time $t_i$. Thus,
those who experienced other types of events remain
in the risk set all the time. The weights are defined as:

(4) $w_{ij} = \frac{\hat{G}(t_i)}{\hat{G}(\min(t_i, t_j))}$

where $\hat{G}$ is the Kaplan-Meier estimate of the
survivor function of the censoring distribution.
This model is valid if the proportionality
assumption holds.
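The weights in equation (4) can be sketched as follows. This is a simplified illustration under stated assumptions: the toy data are invented, the function names `km_censoring` and `fg_weight` are hypothetical, there are no tied times, and the Kaplan-Meier curve is evaluated as a right-continuous step function exactly as the formula is written (software implementations may use left limits instead).

```python
def km_censoring(times, status):
    """Kaplan-Meier estimate of the censoring survivor function G(t) = P(C > t).

    status: 0 = censored, 1 = event of interest, 2 = competing event.
    For G, censoring plays the role of the "event".
    """
    curve = []
    surv = 1.0
    for u in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, s in zip(times, status) if t == u and s == 0)
        surv *= 1.0 - censored / at_risk
        curve.append((u, surv))

    def G(t):
        value = 1.0
        for u, s in curve:
            if u <= t:
                value = s
            else:
                break
        return value

    return G

def fg_weight(G, t_i, t_j):
    """w_ij = G(t_i) / G(min(t_i, t_j)) for subject j at event time t_i."""
    return G(t_i) / G(min(t_i, t_j))

# Toy data (hypothetical): follow-up time and status per subject
times = [2.0, 3.0, 5.0, 7.0, 8.0, 10.0]
status = [2, 0, 1, 0, 1, 0]
G = km_censoring(times, status)

# Subject with a competing event at t = 2 stays in later risk sets,
# but with a weight that shrinks as censoring accumulates.
w_at_5 = fg_weight(G, 5.0, 2.0)
w_at_8 = fg_weight(G, 8.0, 2.0)
# A subject still under observation at the event time gets weight 1.
w_at_risk = fg_weight(G, 5.0, 7.0)
print(w_at_5, w_at_8, w_at_risk)
```

The design point is that subjects who failed from the competing cause are not dropped from the risk set; they are down-weighted by the probability of still being uncensored.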
Binomial Approach

The second method is the direct binomial approach
suggested by Scheike and Zhang, which models the
cumulative incidence function by a general class of
models given by:

(5) $h\{F_1(t, z)\} = g\{\eta(t), \beta, z\}$

where $h$ and $g$ are known link and regression
functions, respectively, $\eta(t)$ is the unknown
regression function, and $\beta$ is the vector of
regression parameters. We use the semi-parametric
multiplicative model:

(6) $\operatorname{cloglog}\{1 - F_1(t; x, z)\} = \eta(t)' x + \beta' z$

where $x$ is a $(p+1)$-dimensional covariate,
$x = (1, x_1, \ldots, x_p)$, and $z$ is a
$q$-dimensional covariate. These flexible models
allow the covariates $x$ to have time-varying effects
and the covariates $z$ to have constant effects.

The model makes it possible to test the hypothesis
that a specific covariate $x_j$ has a constant effect
over time, $H_0: \eta_j(t) = \eta_j$. This leads to a
very useful goodness-of-fit test for model
validation. The test shows exactly where
non-proportionality is present. The approach is to
start with a model in which all effects are either
parametric or non-parametric, and then reduce model
complexity by successive testing to find an
appropriate semi-parametric model that fits the
data. In brief, for this approach the model is
chosen according to the proportionality assumption.
Pseudo-value Approach

The third method of direct modeling of the
cumulative incidence function is based on a
pseudo-value approach. For this model a grid of time
points $\tau_1, \ldots, \tau_M$ is selected. At each
grid point, the cumulative incidence function is
estimated from the complete data set,
$\hat{F}(\tau_h)$, and from the sample of size $n-1$
obtained by deleting the $i$th observation,
$\hat{F}^{(-i)}(\tau_h)$. The pseudo-value for the
$i$th subject at time $\tau_h$ is then defined as:

(8) $\hat{\theta}_{ih} = n\hat{F}(\tau_h) - (n-1)\hat{F}^{(-i)}(\tau_h)$, $i = 1, \ldots, n$, $h = 1, \ldots, M$

These are the pseudo-values known from jackknife
techniques. When there is no censoring, $n\hat{F}(t)$
is the number of events of the type of interest
occurring prior to $t$. In this case

$\hat{\theta}_i = (\hat{\theta}_{ih}, h = 1, \ldots, M) = (I(T_i \le \tau_1, \varepsilon_i = 1), \ldots, I(T_i \le \tau_M, \varepsilon_i = 1))$

and the $\hat{\theta}_i$'s are independent. When there
is censoring, the pseudo-values are close to these
indicators and are therefore approximately
independent. This allows us to make use of results
from generalized linear models to model the effects
of covariates.
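The jackknife construction can be sketched in a few lines. The code below is a minimal illustration, not the software used in the study: the function names and the toy data are assumptions, and the cumulative incidence is estimated with a plain Aalen-Johansen type estimator. With uncensored toy data it reproduces the property stated above, namely that each pseudo-value equals the indicator that the subject had the event of interest by the grid time.

```python
def cif(times, causes, t, cause=1):
    """Aalen-Johansen type estimate of the cumulative incidence of `cause` at t.

    causes: 0 = censored, 1, 2, ... = type of failure.
    """
    surv = 1.0   # overall event-free survival just before the current time
    f = 0.0
    for u in sorted(set(times)):
        if u > t:
            break
        at_risk = sum(1 for x in times if x >= u)
        d_cause = sum(1 for x, c in zip(times, causes) if x == u and c == cause)
        d_all = sum(1 for x, c in zip(times, causes) if x == u and c != 0)
        f += surv * d_cause / at_risk
        surv *= 1.0 - d_all / at_risk
    return f

def pseudo_values(times, causes, grid, cause=1):
    """Pseudo-values n*F(tau) - (n-1)*F^(-i)(tau) for every subject and grid time."""
    n = len(times)
    full = [cif(times, causes, tau, cause) for tau in grid]
    rows = []
    for i in range(n):
        t_loo = times[:i] + times[i + 1:]     # leave subject i out
        c_loo = causes[:i] + causes[i + 1:]
        rows.append([n * full[h] - (n - 1) * cif(t_loo, c_loo, tau, cause)
                     for h, tau in enumerate(grid)])
    return rows

# Hypothetical uncensored toy data: failure time and cause for five subjects
times = [1.0, 2.0, 3.5, 4.0, 6.0]
causes = [1, 2, 1, 1, 2]
grid = [2.5, 5.0]
pv = pseudo_values(times, causes, grid)
for t, c, row in zip(times, causes, pv):
    print(t, c, [round(v, 6) for v in row])
```

With censored data the same functions apply unchanged; the pseudo-values are then no longer exactly 0 or 1, which is why they are passed to a generalized linear model rather than interpreted individually.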

(9) $g(\theta_{ih}) = \alpha_h + \gamma' Z_i = \beta' Z_{ih}$, $i = 1, \ldots, n$, $h = 1, \ldots, M$

where $g(\cdot)$ is a link function. Possible
choices are the logit link, $g(x) = \log(x/(1-x))$,
or the complementary log-log function,
$g(x) = \log(-\log(1-x))$, on $x$. Unlike the Fine and
Gray model, this approach does not require the
proportionality assumption. To select the
appropriate link function when the factor is
categorical, one crude way is to look at plots of the
differences between the transformed estimates of
the cumulative incidence function for each category
and for the baseline category.

For a factor with two categories, the cumulative
incidence functions for the two groups (ignoring
other covariates) are estimated separately. Then
$g(\hat{F}_{1h}(t)) - g(\hat{F}_{10}(t))$ is plotted,
where $\hat{F}_{10}(t)$ and $\hat{F}_{1h}(t)$ are the
estimated cumulative incidence functions for the
baseline and other categories, respectively, and
$g(\cdot)$ is either the logit or the complementary
log-log transform. If the link chosen for the plot
is correct, the curves should be approximately
horizontal.
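The link-selection diagnostic can be illustrated numerically. In the sketch below the baseline cumulative incidence values and the constant `ratio` are invented for illustration, and the second group is constructed so that the complementary log-log link is the correct one; the function names are likewise assumptions. Under that construction the complementary log-log differences are flat, while the logit differences drift over time.

```python
import math

def logit(x):
    return math.log(x / (1.0 - x))

def cloglog(x):
    # Complementary log-log transform, taken here as log(-log(1 - x))
    return math.log(-math.log(1.0 - x))

def link_differences(f_baseline, f_other, link):
    """g(F_other(t)) - g(F_baseline(t)) at each time point."""
    return [link(b) - link(a) for a, b in zip(f_baseline, f_other)]

def spread(values):
    """Range of the differences; close to zero means a horizontal curve."""
    return max(values) - min(values)

# Hypothetical baseline cumulative incidence at five time points
f0 = [0.01, 0.02, 0.04, 0.06, 0.08]
ratio = 2.5   # assumed constant effect on the cloglog scale
f1 = [1.0 - (1.0 - f) ** ratio for f in f0]

d_cloglog = link_differences(f0, f1, cloglog)
d_logit = link_differences(f0, f1, logit)
print("cloglog spread:", spread(d_cloglog))
print("logit spread:  ", spread(d_logit))
```

In practice the inputs would be the estimated cumulative incidence curves for the two groups, and the judgment of "approximately horizontal" is made visually from the plot.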
Data

To compare these three approaches, we used the
data of the Isfahan Cohort Study. The ICS is a
community-based, ongoing longitudinal study of
6504 adults aged 35 and older at baseline, with the
aim of producing an Iranian cardiovascular disease
risk chart. Participants lived in urban and rural
areas of three cities and their associated district
villages in central Iran (Isfahan, Arak, Najafabad).
Several risk factors for cardiovascular disease, such
as smoking status, lipids, blood pressure, and
anthropometric measurements, were measured at
baseline. Participants were followed for 5 years
from January 1997 to September 2001. Follow-up of a
subject ended when one of the cardiovascular disease
(CVD) events (non-fatal myocardial infarction,
fatal myocardial infarction, non-fatal stroke, fatal
stroke, sudden cardiac death, or unstable angina)
occurred or the subject died of a cause unrelated to
CVD. Finally, data of 5515 participants who
had at least one follow-up after baseline were
included in the analysis. The CVD event (the event of
interest) has one competing risk, which occurs when
the subject dies of a cause unrelated to CVD.12-19



Results 

Of the 5515 participants (2815 females and 2700
males) in the ICS data, 5.13% had one of the CVD
events listed above and 1.5% died of causes unrelated
to CVD. The CVD events were non-fatal
myocardial infarction (n = 52), fatal myocardial
infarction (n = 19), sudden cardiac death (n = 46),
non-fatal stroke (n = 40), fatal stroke (n = 14), and
unstable angina (n = 112). Moreover, 2133 subjects
were 35 to 44 years old, 2449 were 45 to 64, and
933 were 65 and older at baseline.
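The reported counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this paragraph; the variable names are ours.

```python
# Event counts as reported in the text
cvd_events = {
    "non-fatal myocardial infarction": 52,
    "fatal myocardial infarction": 19,
    "sudden cardiac death": 46,
    "non-fatal stroke": 40,
    "fatal stroke": 14,
    "unstable angina": 112,
}
n_participants = 5515

total_cvd = sum(cvd_events.values())
pct_cvd = 100.0 * total_cvd / n_participants

# Sex and age-group counts should each add up to the analysis sample
sex_total = 2815 + 2700
age_total = 2133 + 2449 + 933

print(total_cvd, round(pct_cvd, 2), sex_total, age_total)
# prints: 283 5.13 5515 5515
```

The six event counts sum to 283, which matches the reported 5.13% of 5515 participants.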

The three competing risks approaches that are
based directly on the cumulative incidence function
(Fine and Gray, binomial, and pseudo-value) were
fitted to the ICS data with R software. As is common
in the medical literature, parametric models were
studied first. Table 1 shows the results. The Fine
and Gray model has the largest number of
significant covariates (8) and the smallest variances.
In contrast, the multiplicative model has the
smallest number of significant covariates (6) and
the largest variances, and 7 covariates are
significant in the logit and complementary log-log
models. In the Fine and Gray model, all covariates
except abdominal obesity (P = 0.76) and low
high-density lipoprotein cholesterol (low HDL-C)
(P = 0.20) are significant (P < 0.05). For the
multiplicative model, age, abdominal obesity,
hypertension, diabetes mellitus, and current
smoking status are significant (P < 0.05). In the
logit and complementary log-log models, age,
hypertension, high low-density lipoprotein
cholesterol (high LDL-C), low HDL-C, diabetes
mellitus, and current smoking status are significant
(P < 0.05). Slight differences among the models are
seen in the parameter estimates. In addition, for the
Fine and Gray, logit, and multiplicative models,
exp(b) can be interpreted as the odds in favor of a
category of a factor relative to the baseline
category. Table 2 shows the results of fitting the
non-parametric multiplicative model. This model
differs from the parametric models in that its
coefficients have time-varying effects. The table
also shows the results of the goodness-of-fit
(constant effect) test. The effects of age (65 years
and older), abdominal obesity, and diabetes mellitus
are significantly non-constant (P < 0.05). This
implies that the Fine and Gray, parametric
multiplicative, and non-parametric multiplicative
models are not appropriate, because the
proportionality assumption is violated. Therefore,
fitting the semi-parametric model is necessary; it
allows covariates with constant and non-constant
effects to be present simultaneously in the model.
We use this model later to predict the
cumulative incidence function for specific subjects.
Table 3 shows the semi-parametric model results. For
this model, age (65 years and older), abdominal
obesity, and diabetes mellitus do not have parameter
estimates because their effects are not constant
over time. Figure 1 shows the goodness-of-fit plot for
hypertension with the logit and complementary
log-log transforms. The two plots are approximately
horizontal, meaning that both links are suitable.
Because the complementary log-log model gives
smaller standard errors than the logit model
(Table 1), it is preferred.



Table 1. Results of fitting parametric models on Isfahan Cohort Study (ICS) data

Covariate               Statistic   Fine and Gray   Logit model   Complementary log-log   Multiplicative
                                    model                         model on 1-F1(t)        model
Sex**                   b           0.482           0.320         0.265                   0.186
                        SE (b)      0.146           0.191         0.180                   0.221
                        P           (0.001)*        (0.094)       (0.142)                 (0.400)
Age*** 45-64            b           0.828           0.790         0.770                   1.190
                        SE (b)      0.188           0.252         0.246                   0.276
                        P           (< 0.001)*      (0.002)*      (0.002)*                (< 0.001)*
Age*** ≥ 65             b           1.475           1.438         1.372                   1.900
                        SE (b)      0.198           0.259         0.251                   0.278
                        P           (< 0.001)*      (< 0.001)*    (< 0.001)*              (< 0.001)*
Abdominal obesity       b           -0.04           -0.165        -0.168                  -0.460
                        SE (b)      0.151           0.200         0.188                   0.244
                        P           (0.760)         (0.409)       (0.372)                 (0.050)*
Hypertension            b           0.980           1.154         1.099                   1.190
                        SE (b)      0.129           0.158         0.150                   0.202
                        P           (< 0.001)*      (< 0.001)*    (< 0.001)*              (< 0.001)*
High LDL-C              b           0.455           0.412         0.381                   0.194
                        SE (b)      0.124           0.163         0.154                   0.200
                        P           (< 0.001)*      (0.012)*      (0.013)*                (0.313)
Low HDL-C               b           0.162           0.376         0.353                   0.336
                        SE (b)      0.153           0.168         0.157                   0.210
                        P           (0.200)         (0.025)*      (0.024)*                (0.109)
Diabetes mellitus       b           0.592           0.600         0.513                   0.733
                        SE (b)      0.153           0.191         0.177                   0.225
                        P           (< 0.001)*      (0.002)*      (0.004)*                (0.001)*
Hypertriglyceridemia    b           0.340           0.253         0.233                   0.119
                        SE (b)      0.137           0.177         0.167                   0.224
                        P           (0.013)*        (0.153)       (0.163)                 (0.597)
Smoking                 b           0.391           0.585         0.533                   0.607
                        SE (b)      0.153           0.198         0.184                   0.233
                        P           (0.010)*        (0.003)*      (0.003)*                (0.009)*

* Significant at α = 0.05 level; ** Females are the reference group; *** Ages 35 to 44 are the reference group
SE: Standard error; LDL-C: Low-density lipoprotein cholesterol; HDL-C: High-density lipoprotein cholesterol
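As a reading aid for Table 1, the sketch below recomputes two-sided P-values from the reported coefficients and standard errors for the sex covariate, assuming a Wald test with a standard normal reference distribution. That the published P-values were obtained this way is our assumption; the function name is hypothetical. Because b and SE (b) are rounded to three decimals, agreement with the table is only expected to within about 0.002.

```python
import math

def wald_p_value(b, se):
    """Two-sided P-value for H0: coefficient = 0 under a normal approximation."""
    z = b / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))

# (b, SE(b), reported P) for the sex covariate in Table 1
sex_rows = {
    "Fine and Gray": (0.482, 0.146, 0.001),
    "Logit": (0.320, 0.191, 0.094),
    "Complementary log-log": (0.265, 0.180, 0.142),
    "Multiplicative": (0.186, 0.221, 0.400),
}

recomputed = {}
for model, (b, se, reported) in sex_rows.items():
    recomputed[model] = wald_p_value(b, se)
    print(f"{model:22s} exp(b) = {math.exp(b):.3f}  "
          f"P = {recomputed[model]:.3f} (reported {reported:.3f})")
```

The same function can be applied to any other row of the table to see how the larger standard errors of the multiplicative model translate into fewer significant covariates.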



Table 2. P-values for non-parametric model on Isfahan Cohort Study (ICS) data

                        Multiplicative model
Covariate               H0: η(t) = 0    H0: constant effect
Sex                     0.358           0.264
Age 45-64               < 0.001*        0.280
Age ≥ 65                < 0.001*        < 0.001*
Abdominal obesity       0.002*          0.016*
Hypertension            < 0.001*        0.096
High LDL-C              0.170           0.508
Low HDL-C               0.118           0.490
Diabetes mellitus       < 0.001*        0.024*
Hypertriglyceridemia    0.240           0.578
Smoking                 0.012*          0.084

* Significant at α = 0.05 level
LDL-C: Low-density lipoprotein cholesterol; HDL-C: High-density lipoprotein cholesterol
Table 3. Results of fitting semi-parametric model on Isfahan Cohort Study (ICS) data

                        Multiplicative model
Covariate               b        SE (b)   P
Sex                     0.142    0.225    0.527
Age 45-64               1.090    0.225    < 0.001*
Age ≥ 65                -        -        < 0.001*
Abdominal obesity       -        -        < 0.001*
Hypertension            1.190    0.201    < 0.001*
High LDL-C              0.213    0.202    0.292
Low HDL-C               0.375    0.225    0.081
Diabetes mellitus       -        -        < 0.001*
Hypertriglyceridemia    0.110    0.236    0.640
Smoking                 0.635    0.234    0.006*

* Significant at α = 0.05 level
SE: Standard error; LDL-C: Low-density lipoprotein cholesterol; HDL-C: High-density lipoprotein cholesterol

Figure 1. Difference in cumulative incidence function for logit and complementary log-log transforms in hypertension

Sometimes it is important to get an idea of the
cumulative incidence probability for specific
patients. Therefore, computing the predicted
cumulative incidence function for a given set of
covariate values is very popular.22,23 For example,
suppose that physicians want to know the value of
the cumulative incidence function for male patients
older than 65 with abdominal obesity, hypertension,
high LDL-C, low HDL-C, diabetes mellitus,
hypertriglyceridemia, and smoking. Figure 2 shows
the predicted cumulative incidence function over
60 months for the two appropriate models, the
complementary log-log and semi-parametric
multiplicative models. The predicted values for the
first model are lower than those for the second
model for about 35 months (between the 15th and
58th months).

Discussion 

Data from studies with competing risks outcomes
present challenges to the data analyst. Some articles
analyze such data with normal survival models. A
criticism that can be leveled at these models is the
assumption that upon removal of one cause of
failure, the risk of failure from the remaining causes
is unchanged. In human studies this assumption is
rarely true. Here we have used three approaches
(Fine and Gray, binomial, and pseudo-value) that
are based directly on the cumulative incidence
function; the choice among them depends on whether
the proportionality assumption holds. This
collection of models gives a rich variety from which
a user can choose an appropriate model for
analyzing the data.

We saw that the Fine and Gray and parametric
multiplicative models were not able to describe the
cumulative incidence function for the ICS data. This
lack of flexibility was found using the
goodness-of-fit approach, which showed that the
non-proportionality can primarily be attributed to
the effects of covariates. A similar conclusion was
reached for the non-parametric multiplicative
model. The semi-parametric multiplicative model
could be a good choice for these data. With the
pseudo-value approach, two link functions were
used in the GLM (logit and complementary log-log).
Unlike the Fine and Gray and multiplicative models,
this approach is more flexible in that we do not
need to assume proportionality. Goodness-of-fit
plots showed that both link functions are suitable
for the hypertension groups, but they differed in
variance estimation, and the complementary log-log
function seems more appropriate. Prediction plots
for the ICS data using the semi-parametric
multiplicative and complementary log-log models
were quite similar over 5 years, but slight
differences in regression parameters were found
between the two models.

Conclusion 

Inappropriate statistical methods are not rare in
the biomedical literature. The competing risk problem
is a critical issue in survival analysis. We would
recommend considering competing risk models
instead of simply using normal survival methods
when subjects are exposed to more than one cause of
failure. In future studies like the ICS, using
competing risks models is suggested, because a large
number of deaths unrelated to CVD will occur during
the years of follow-up and the use of normal survival
functions can lead to incorrect or at least imprecise
estimates. As described above, the semi-parametric
multiplicative and complementary log-log models are
proposed for fitting such data.